Fast Nearest Neighbors in Large and Composite Networks

نویسندگان

  • Petko Bogdanov
  • Ambuj K. Singh
چکیده

We address the problem of k Nearest Neighbor (kNN) search in networks using a random walk based proximity measure. Our approach retrieves the exact top neighbors at query time without relying on off-line indexing or summaries of the entire network. This makes it suitable for very large networks, as well as for composite network overlays mixed at query time. We provide scalability and flexibility without compromising the quality of results due to theoretical bound guarantees that we develop and incorporate in our search procedure. We incrementally construct a subgraph of the underlying network, sufficient to obtain the exact top k neighbors. We guide the construction of the relevant subgraph in order to achieve fast refinement of the lower and upper proximity bounds, which in turn enables effective pruning of infeasible candidates. We apply our kNN search algorithm on social, information and biological networks and demonstrate the effectiveness and scalability of our approach. For networks in the order of a million nodes, our method retrieves the exact top 20 using less than 0.2% of the network edges in a fraction of a second on a conventional desktop machine without prior indexing. When employed for nearest neighbors search in composite network overlays, it scales linearly with the number of networks mixed in the overlay.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Scalable Nearest Neighbors with Guarantees in Large and Composite Networks

We address the problem of k Nearest Neighbor (kNN) search in networks, according to a random walk proximity measure called Effective Importance. Our approach retrieves the exact top neighbors at query time without relying on off-line indexing or summaries of the entire network. This makes it suitable for very large dynamic networks, as well as for composite network overlays mixed at query time....

متن کامل

Comparison and evaluation of the performance of data-driven models for estimating suspended sediment downstream of Doroodzan Dam

Dams control most of the sediment entering the reservoir by creating static environments. However, sediment leaving the dam depends on various factors such as dam management method, inlet sediment, water height in the reservoir, the shape of the reservoir, and discharge flow. In this research, the amount of suspended sediment of Doroodzan Dam based on a statistical period of 25 years has been i...

متن کامل

Fast Large-Scale Approximate Graph Construction for NLP

Many natural language processing problems involve constructing large nearest-neighbor graphs. We propose a system called FLAG to construct such graphs approximately from large data sets. To handle the large amount of data, our algorithm maintains approximate counts based on sketching algorithms. To find the approximate nearest neighbors, our algorithm pairs a new distributed online-PMI algorith...

متن کامل

A Novel Hybrid Approach for Email Spam Detection based on Scatter Search Algorithm and K-Nearest Neighbors

Because cyberspace and Internet predominate in the life of users, in addition to business opportunities and time reductions, threats like information theft, penetration into systems, etc. are included in the field of hardware and software. Security is the top priority to prevent a cyber-attack that users should initially be detecting the type of attacks because virtual environments are not moni...

متن کامل

A Fast k-Neighborhood Algorithm for Large Point-Clouds

Algorithms that use point-cloud models make heavy use of the neighborhoods of the points. These neighborhoods are used to compute the surface normals for each point, mollification, and noise removal. All of these primitive operations require the seemingly repetitive process of finding the k nearest neighbors of each point. These algorithms are primarily designed to run in main memory. However, ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2010